Solving package dependencies: from EDOS to Mancoosi
Mancoosi (Managing the Complexity of the Open Source Infrastructure) is an
ongoing research project funded by the European Union that addresses some of
the challenges related to the "upgrade problem" of interdependent software
components, of which Debian packages are prototypical examples. Mancoosi is
the natural continuation of the EDOS project, which has already contributed
tools for distribution-wide quality assurance in Debian and other GNU/Linux
distributions. The consortium behind the project consists of several European
public and private research institutions, as well as commercial GNU/Linux
distributions from Europe and South America. Debian is represented by a small
group of Debian Developers who work within the participating universities to
steer the project and integrate its results back into Debian. This paper
presents relevant EDOS results in dependency management and gives an overview
of the Mancoosi project and its objectives, with a particular focus on the
prospective benefits for Debian.
Expressing advanced user preferences in component installation
State-of-the-art component-based software collections, such as FOSS
distributions, are made of up to tens of thousands of components, with
complex inter-dependencies and conflicts. Given a particular installation of
such a system, each request to alter the set of installed components has
potentially (too) many satisfying answers. We present an architecture that
allows users to express advanced preferences about package selection in FOSS
distributions. The architecture is composed of a distribution-independent
format for describing available and installed packages, called CUDF (Common
Upgradeability Description Format), and a foundational language for
specifying optimization criteria, called MooML. We present the syntax and
semantics of CUDF and MooML, and discuss the partial evaluation mechanism of
MooML, which improves the efficiency of package dependency solvers.
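
To make the notion of user preferences concrete, the following Python
fragment is a minimal sketch of the kind of lexicographic optimization
criterion such an architecture can express (illustrative only: the paper's
actual mechanism is the MooML language, whose syntax is not reproduced
here). Among candidate final states, it prefers the one that removes the
fewest installed packages, breaking ties by the number of changed packages.

    def removed(initial, final):
        """Packages installed initially but absent from the final state."""
        return {p for p in initial if p not in final}

    def changed(initial, final):
        """Packages whose installed version differs between the states."""
        keys = set(initial) | set(final)
        return {p for p in keys if initial.get(p) != final.get(p)}

    def best_solution(initial, candidates):
        # Lexicographic minimization: fewest removals, then fewest changes.
        return min(candidates,
                   key=lambda final: (len(removed(initial, final)),
                                      len(changed(initial, final))))

    # Two ways to satisfy a request; the second avoids removing 'bar'.
    installed = {"foo": 1, "bar": 2}
    plans = [{"foo": 2}, {"foo": 2, "bar": 2}]
    assert best_solution(installed, plans) == {"foo": 2, "bar": 2}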
Description of the CUDF Format
This document contains several related specifications that together describe
the document formats used in the solver competition organized by Mancoosi.
In particular, it describes: DUDF (Distribution Upgradeability Description
Format), the format used to submit upgrade problem instances from user
machines to a (distribution-specific) database of upgrade problems; and CUDF
(Common Upgradeability Description Format), the format used to encode
upgrade problems while abstracting over distribution-specific details.
Solvers taking part in the competition will be fed input in CUDF format.
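
For illustration, a tiny CUDF document might look as follows (a hand-written
sketch following the published CUDF specification, not an excerpt from this
document): package stanzas describe the known package universe with
dependencies and installation status, and a trailing request stanza states
the change the user asks for.

    package: car
    version: 1
    depends: engine >= 1, wheel
    installed: true

    package: engine
    version: 1
    installed: true

    package: engine
    version: 2

    package: wheel
    version: 3
    installed: true

    request: 
    upgrade: engine

In the competition setting, a solver answers with a similar listing of
package stanzas describing the resulting installation.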
Efficient Prior Publication Identification for Open Source Code
Free/Open Source Software (FOSS) enables large-scale reuse of preexisting
software components. The main drawback is increased complexity in software
supply chain management. A common approach to taming such complexity is
automated open source compliance, which consists in automating the
verification of adherence to various open source management best practices
concerning license obligation fulfillment, vulnerability tracking, software
composition analysis, and nearby concerns. We consider the problem of
auditing a source code base to determine which of its parts have been
published before, an important building block of automated open source
compliance toolchains. Indeed, if source code allegedly developed in house
is recognized as having been previously published elsewhere, alerts should
be raised to investigate where it comes from and whether additional
obligations must be fulfilled before product shipment. We propose an
efficient approach for prior publication identification that relies on a
knowledge base of known source code artifacts, linked together in a global
Merkle directed acyclic graph, and a dedicated discovery protocol. We
introduce swh-scanner, a source code scanner that implements the proposed
approach in practice, using as knowledge base Software Heritage, the largest
public archive of source code artifacts. We validate the proposed approach
experimentally, showing its efficiency in both abstract (number of queries)
and concrete (wall-clock time) terms, with benchmarks on 16,845 real-world
public code bases of various sizes, from small to very large.
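
The gist of the approach can be conveyed in a few lines of Python. Each file
content is identified by its SWHID, an intrinsic identifier derived from the
bytes alone (reusing Git's blob hashing convention), and the knowledge base
is asked which identifiers it already holds. The sketch below assumes the
Software Heritage Web API's /api/1/known/ endpoint, which takes a list of
SWHIDs and reports which are archived; the real swh-scanner is more
elaborate, e.g., walking the directory Merkle DAG top-down to skip querying
files whose enclosing directory is already known to the archive.

    import hashlib, json, sys
    from pathlib import Path
    from urllib.request import Request, urlopen

    def content_swhid(data: bytes) -> str:
        # SWHIDs for file contents reuse the Git blob convention:
        # sha1 over b"blob <length>\0" followed by the raw bytes.
        h = hashlib.sha1(b"blob %d\x00" % len(data))
        h.update(data)
        return "swh:1:cnt:" + h.hexdigest()

    def known(swhids):
        # Ask the archive which of the given SWHIDs it already holds.
        req = Request("https://archive.softwareheritage.org/api/1/known/",
                      data=json.dumps(swhids).encode(),
                      headers={"Content-Type": "application/json"})
        return json.load(urlopen(req))

    if __name__ == "__main__":
        files = [p for p in Path(sys.argv[1]).rglob("*") if p.is_file()]
        ids = {content_swhid(p.read_bytes()): p for p in files}
        for swhid, status in known(list(ids)).items():
            if status.get("known"):
                print(f"previously published: {ids[swhid]} ({swhid})")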
Towards maintainer script modernization in FOSS distributions
Free and Open Source Software (FOSS) distributions are complex software
systems, made of thousands of packages that evolve rapidly, independently,
and without centralized coordination. During package upgrades, corner-case
failures can be encountered that are hard to deal with, especially when they
are due to misbehaving maintainer scripts: executable code snippets used to
finalize package configuration. In this paper we report on a software
modernization experience, i.e., the process of representing existing legacy
systems in terms of models, applied to FOSS distributions. We present a
process to define meta-models that enable dealing with upgrade failures and
help rolling back from them, taking maintainer scripts into account. The
process has been applied to widely used FOSS distributions, and we report on
these experiences.
Where are your Manners? Sharing Best Community Practices in the Web 2.0
The Web 2.0 fosters the creation of communities by offering users a wide
array of social software tools. While the success of these tools rests on
their ability to support different interaction patterns among users while
imposing as few limitations as possible, the communities they support are
not free of rules (consider, for example, the posting rules of a community
forum or the editing rules of a thematic wiki). In this paper we propose a
framework for sharing best community practices in the form of a (potentially
rule-based) annotation layer that can be integrated with existing Web 2.0
community tools (with a specific focus on wikis). This solution is
characterized by minimal intrusiveness and fits the open spirit of the Web
2.0 by providing users with behavioral hints rather than enforcing strict
adherence to a set of rules. Comment: ACM Symposium on Applied Computing,
Honolulu, United States (2009).
Gender Differences in Public Code Contributions: a 50-year Perspective
Gender imbalance in information technology in general, and in Free/Open
Source Software specifically, is a well-known problem in the field. Still,
little is known about the large-scale extent and long-term trends that
underpin the phenomenon. We contribute to filling this gap by conducting a
longitudinal study of the population of contributors to publicly available
software source code. We analyze 1.6 billion commits, corresponding to the
development history of 120 million projects, contributed by 33 million
distinct authors over a period of 50 years. We classify author names by
gender and study their evolution over time. We show that, while the amount
of commits by female authors remains low overall, there is evidence of a
stable long-term increase in their proportion over all contributions,
providing hope of a more gender-balanced future for collaborative software
development.
Determining the Intrinsic Structure of Public Software Development History
Background. Collaborative software development has produced a wealth of
version control system (VCS) data that can now be analyzed in full. Little
is known about the intrinsic structure of the entire corpus of publicly
available VCS as an interconnected graph. Understanding its structure is
needed to determine the best approach to analyze it in full and to avoid
methodological pitfalls when doing so. Objective. We intend to determine the
most salient network topology properties of public software development
history as captured by VCS. We will explore: degree distributions,
determining whether they are scale-free or not; the distribution of
connected component sizes; and the distribution of shortest path lengths.
Method. We will use Software Heritage, the largest corpus of public VCS
data, compress it using webgraph compression techniques, and analyze it
in-memory using classic graph algorithms. Analyses will be performed both on
the full graph and on relevant subgraphs. Limitations. The study is
exploratory in nature; as such, no hypotheses on the findings are stated at
this time. The chosen graph algorithms are expected to scale to the corpus
size, but this will need to be confirmed experimentally. External validity
will depend on how representative Software Heritage is of the software
commons. Comment: MSR 2020 - 17th International Conference on Mining
Software Repositories, Oct 2020, Seoul, South Korea.
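
To give a sense of the planned analyses, the following Python sketch
computes the three properties listed above on a toy graph using networkx;
the study itself cannot use such in-memory adjacency structures directly and
instead relies on webgraph compression to fit the full corpus in RAM.

    from collections import Counter
    import networkx as nx

    # Toy stand-in for the VCS graph: nodes are revisions, directories,
    # and file contents; edges point to the objects each node references.
    g = nx.DiGraph([("rev2", "rev1"), ("rev2", "dir2"),
                    ("rev1", "dir1"), ("dir2", "dir1"),
                    ("dir1", "cnt1"), ("dir2", "cnt2")])

    # 1. Degree distribution: how many nodes have in-degree k?
    in_deg = Counter(d for _, d in g.in_degree())
    print("in-degree distribution:", dict(in_deg))

    # 2. Connected component sizes (weak connectivity, ignoring direction).
    sizes = sorted((len(c) for c in nx.weakly_connected_components(g)),
                   reverse=True)
    print("component sizes:", sizes)

    # 3. Distribution of shortest path lengths between reachable pairs.
    lengths = Counter(l for _, ls in nx.shortest_path_length(g)
                      for l in ls.values() if l > 0)
    print("shortest path lengths:", dict(lengths))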
Content-Based Textual File Type Detection at Scale
Programming language detection is a common need in the analysis of large
source code bases. It is supported by a number of existing tools that rely
on several features, most notably file extensions, to determine file types.
We consider the problem of accurately detecting the type of files commonly
found in software code bases based solely on textual file content. Doing so
is helpful to classify source code that lacks file extensions (e.g., code
snippets posted on the Web or executable scripts), to avoid misclassifying
source code recorded with wrong or uncommon file extensions, and also to
shed some light on the intrinsic recognizability of source code files. We
propose a simple model that (a) uses a language-agnostic word tokenizer for
textual files, (b) groups tokens into 1-/2-grams, (c) builds feature vectors
based on N-gram frequencies, and (d) uses a simple fully connected neural
network as classifier. As training set we use textual files extracted from
GitHub repositories with at least 1000 stars, using existing file extensions
as ground truth. Despite its simplicity, the proposed model reaches 85%
accuracy in our experiments for a relatively high number of recognized
classes (more than 130 file types).
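
Steps (a) through (d) map naturally onto off-the-shelf components. The
following scikit-learn sketch mirrors the shape of that pipeline on a toy
corpus; it is an illustration under those assumptions, not the paper's
implementation, which uses its own tokenizer, training set, and network.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline

    # Tiny illustrative corpus; the paper instead trains on files from
    # popular GitHub repositories, labeled by existing file extensions.
    samples = ["def main(): pass", "int main(void) { return 0; }",
               "print('hi')",      "printf(\"hi\\n\");"]
    labels  = ["python", "c", "python", "c"]

    model = make_pipeline(
        # (a)+(b): language-agnostic word tokenization into 1-/2-grams,
        # (c) turned into n-gram occurrence-count feature vectors...
        CountVectorizer(ngram_range=(1, 2), token_pattern=r"\S+"),
        # ...fed to (d) a simple fully connected neural network.
        MLPClassifier(hidden_layer_sizes=(64,), max_iter=500),
    )
    model.fit(samples, labels)
    print(model.predict(["void main() { puts(\"hi\"); }"]))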